Ant Swarm Reinforcement Learning for Formulating Online Promotion Strategies

Authors

  • Tuck Siong Chung
  • P. K. Kannan
Abstract

The emergence of the online channel has rendered the retail environment more dynamic than ever before. Rapid developments in technology are allowing more variations of products and services online, thereby expanding the product/service line. Customers’ preferences have also become more dynamic, partly as a result of the proliferation of products and services. In addition, the low entry barrier for competition in many categories has contributed significantly to this dynamism. In such a dynamic environment, inferring customer preferences and responding to them with appropriate actions such as product design, pricing, and promotion is becoming difficult for online/multi-channel retailers.

Typically, many such strategies are developed within a static framework: data are collected first, then analyzed, and then appropriate strategies are implemented to optimally impact the market. For example, customer equity studies through which “best” customers are identified, promotional strategies designed to impact the customer base optimally, and mass customization strategies are all based on analyzing data in a static framework and implementing the resultant strategies under the assumption that the underlying preference patterns and market conditions have not changed significantly. However, if the underlying conditions are dynamic, such strategies do not achieve their intended results. In dynamic environments, changes in preferences and response patterns have to be learned and tracked over time (see Sutton and Barto, 2000) in order to formulate appropriate strategies.

In this paper, we propose a reinforcement learning approach based on the model of ant swarms (Dorigo and Di Caro, 1999). We focus on an online retailer that has various promotional options to induce customers to purchase items from its Web site. The promotional options include straight price discounts, incentives for cumulative purchasing (loyalty programs), e-mail based coupons, banner ad discounts, and free shipping offers. The problem the retailer faces is: which promotional tools are optimal for the different segments of customers it attracts, in terms of increasing the purchase probability? One possible way to learn about the effectiveness of these tools is to present the options one at a time and determine each tool’s impact. However, each customer’s reaction may depend on the state he/she is in, in addition to underlying differences across customers. For example, a customer who likes coupon promotions may not purchase anything from the online retailer if the time since his/her last purchase is short, whereas the same customer may buy if the time since the last purchase is long. While mathematical models can still be built to estimate the impact of such variables on purchase, if the underlying preference (coupon-proneness) changes over time, the results of the model may no longer be valid. In such cases, the learning has to be continuous and reinforcing in order to track the changes in the underlying dimensions of the market.

The general idea behind our reinforcement learning model is as follows. The focus is to map situations to actions, and this mapping is learned through trial-and-error search for a delayed reward. The four main elements our model considers are a policy, a reward function, a value function, and a model of the environment. The model of the environment mimics the behavior of the online environment: given a state and an action, the model predicts the resultant next state and next reward.
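
To make the state/action/reward framing concrete, the following minimal Python sketch shows one way the environment model described above could be represented. It is illustrative only: the class and field names (CustomerState, PromotionEnvModel), the logistic purchase-probability function, and the reward definition are our own assumptions, not the authors’ specification.

    # Minimal sketch (illustrative assumptions only): a customer state, the
    # promotional actions named in the abstract, and an environment model that,
    # given a state and an action, returns the next state and the reward.
    import math
    import random
    from dataclasses import dataclass, replace as dc_replace

    PROMO_ACTIONS = [
        "price_discount",      # straight price discount
        "loyalty_incentive",   # incentive for cumulative purchasing
        "email_coupon",
        "banner_ad_discount",
        "free_shipping",
    ]

    @dataclass(frozen=True)
    class CustomerState:
        days_since_last_purchase: int
        last_basket_size: float
        offered_promo_before: bool
        responded_to_last_promo: bool

    class PromotionEnvModel:
        """Model of the environment: (state, action) -> (next state, reward)."""

        def __init__(self, coupon_proneness: float = 0.3):
            # Underlying preference; in Case 2 of the paper this would drift over time.
            self.coupon_proneness = coupon_proneness

        def purchase_probability(self, state: CustomerState, action: str) -> float:
            # Placeholder response model: purchase becomes more likely as time since
            # the last purchase grows, and coupon-prone customers respond to e-mail coupons.
            utility = -1.5 + 0.05 * state.days_since_last_purchase
            if action == "email_coupon":
                utility += 2.0 * self.coupon_proneness
            return 1.0 / (1.0 + math.exp(-utility))

        def step(self, state: CustomerState, action: str):
            purchased = random.random() < self.purchase_probability(state, action)
            reward = state.last_basket_size if purchased else 0.0  # simplified reward
            next_state = dc_replace(
                state,
                days_since_last_purchase=0 if purchased else state.days_since_last_purchase + 1,
                offered_promo_before=True,
                responded_to_last_promo=purchased,
            )
            return next_state, reward

In this sketch the environment’s response depends both on the customer’s observable state (recency, past response) and on an underlying preference parameter (coupon_proneness), which corresponds to the quantity that may drift over time in the dynamic case.
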
While the different promotional tools are the possible actions, each consumer could be in a different state depending on whether the customer has been offered a promotion previously, his/her response to it, the time since the last purchase, the basket size of the last purchase, and so on. We model the trial-and-error search phase using an ant swarm approach (Gutjahr, 2000; Chang, 2004). Just as a swarm of ants starts out in multiple directions (parallel search) in search of a reward, our approach starts with a swarm of actions over a selected number of customers to learn about their responses. The longer this phase lasts, the better the learning will be; however, because the environment itself is dynamic, spending a long time learning may be useless from a rewards viewpoint if the environment changes in the meantime. Thus, it is better to learn quickly and start implementing the strategy based on that learning so that it can be exploited. We examine the application of the ant swarm model to the online promotion problem using simulation. In Case 1, we let the underlying preferences be static and examine the efficiency of learning the efficacy of the promotional tools as a function of learning time (and data collected). In Case 2, we let the underlying preferences be dynamic and examine the efficiency of the learning. In addition, we examine the tension between “exploring” and “exploiting”. Finally, we describe how the learning model can be used in practice for setting online promotional strategies. We will present the conceptual model, the reinforcement learning set-up, and the results of the preliminary investigation at the conference.
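
A rough sketch of how the ant-swarm search phase and the exploring/exploiting trade-off could be organized is given below. It builds on the illustrative CustomerState and PromotionEnvModel classes from the previous sketch; the pheromone update rule, the evaporation and exploration parameters, and the coarse state discretization are our assumptions rather than the authors’ algorithm.

    # Rough sketch of the ant-swarm search phase (builds on CustomerState and
    # PromotionEnvModel above). Pheromone levels on (state, action) pairs are
    # reinforced by realized rewards and evaporate over time; explore_prob controls
    # the exploring/exploiting trade-off.
    import random
    from collections import defaultdict

    class AntSwarmPromoLearner:
        def __init__(self, actions, evaporation=0.1, explore_prob=0.2):
            self.actions = actions
            self.evaporation = evaporation             # forget old evidence in a dynamic market
            self.explore_prob = explore_prob           # probability of trying a random promotion
            self.pheromone = defaultdict(lambda: 1.0)  # (state_key, action) -> pheromone level

        def state_key(self, state):
            # Coarse, illustrative discretization of the customer state.
            recency = "recent" if state.days_since_last_purchase < 14 else "lapsed"
            return (recency, state.responded_to_last_promo)

        def choose_action(self, state):
            key = self.state_key(state)
            if random.random() < self.explore_prob:
                return random.choice(self.actions)                        # explore
            weights = [self.pheromone[(key, a)] for a in self.actions]
            return random.choices(self.actions, weights=weights, k=1)[0]  # exploit

        def update(self, state, action, reward):
            key = self.state_key(state)
            for a in self.actions:                     # evaporation on all actions for this state
                self.pheromone[(key, a)] *= (1.0 - self.evaporation)
            self.pheromone[(key, action)] += reward    # reinforce the action that was tried

    def run_swarm_episode(learner, env_model, customers):
        # Dispatch one "swarm" of trial promotions in parallel over a sample of customers.
        for state in customers:
            action = learner.choose_action(state)
            _, reward = env_model.step(state, action)
            learner.update(state, action, reward)

Evaporation plays the role of forgetting: it discounts old evidence so that pheromone levels can re-adapt when the underlying preferences drift (Case 2), while explore_prob determines how often the swarm keeps sampling currently non-preferred promotions.
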



Publication year: 2006